Check that all models were downloaded

Check that all the directories for the .nc files got made

source_dl <- dir(here("data_raw", "CMIP6"))
source_id <- 
  idx$source_id %>% 
  unique() %>% 
  str_to_lower() %>% 
  str_replace_all("-", "_")
stopifnot(all(source_id %in% source_dl))

Check that all the corresponding .csv files exist

csvs <- list.files(here("data"))
stopifnot(all(paste0(source_id, "_data.csv") %in% csvs))

Ensure that all models are complete

For this analysis, I only want to use models with pr, tas, hfss, and hfls variables in all 5 scenarios (historical, ssp126, ssp245, ssp370, and ssp585)
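A completeness check along these lines can be sketched in base R. This is an illustration with a toy stand-in for the download index (the real index tibble `idx` presumably has one row per model × variable × experiment; the column names here are assumptions):

```r
# Hypothetical sketch: keep only models that have every required
# variable in every required experiment.
required_vars <- c("pr", "tas", "hfss", "hfls")
required_exps <- c("historical", "ssp126", "ssp245", "ssp370", "ssp585")

# toy index: one row per (model, variable, experiment)
idx_toy <- expand.grid(source_id = c("model_a", "model_b"),
                       variable = required_vars,
                       experiment_id = required_exps,
                       stringsAsFactors = FALSE)
# drop one combination so model_b is incomplete
idx_toy <- idx_toy[!(idx_toy$source_id == "model_b" &
                     idx_toy$variable == "hfls" &
                     idx_toy$experiment_id == "ssp585"), ]

# a model is complete if it has all 4 variables x 5 experiments = 20 rows
n_combos <- tapply(seq_len(nrow(idx_toy)), idx_toy$source_id, length)
complete_models <- names(n_combos)[n_combos ==
                     length(required_vars) * length(required_exps)]
complete_models  # "model_a"
```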

Calculations

Perform the calculations necessary to compare PET and SPEI among models and between models and the observed data.

For PET, I’m using the “energy-only” method proposed by Milly and Dunne (2016), eq. 8:

\[ PET = 0.8(R_n - G) \]

In their notes, however, they estimate \(R_n - G\) as hfls + hfss, converting to units of mm/day using the latent heat of vaporization of water, given by their eq. 2:

\[ L_v(T) = 2.501 - 0.002361\,T \]

where \(T\) is air temperature in ºC and \(L_v\) is in MJ/kg.
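The conversion can be sketched as follows (the function names are mine, not from the analysis code; hfls and hfss are assumed to be in W m⁻² and tas in ºC):

```r
# latent heat of vaporization (MJ/kg), Milly & Dunne (2016) eq. 2
lv <- function(tas_c) 2.501 - 0.002361 * tas_c

# energy-only PET (mm/day), their eq. 8, with Rn - G estimated as hfls + hfss.
# 1 W/m2 = 0.0864 MJ/m2/day; dividing by Lv (MJ/kg) gives kg/m2/day = mm/day.
pet_mm_day <- function(hfls, hfss, tas_c) {
  0.8 * (hfls + hfss) * 0.0864 / lv(tas_c)
}

pet_mm_day(hfls = 70, hfss = 60, tas_c = 25)  # ~3.68 mm/day
```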

For the observed data and the CMIP6 data from the same period, I calculate 3-month SPEI using precipitation and PET.

Comparison to Observed Historical

Because SPEI, the variable of interest, is standardized, we looked for models that captured the seasonality of precipitation rather than focusing on how well they estimated exact precipitation amounts. We calculated mean precipitation for each month, January through December, and then computed a correlation coefficient between the 12 monthly means from each CMIP6 model and the 12 observed monthly means. We eliminated models with precipitation correlation coefficients less than 0.6. SPEI also takes evapotranspiration into account, specifically through hfls and hfss; because these variables were not available for the observed data, we applied the same procedure to temperature, but with a less stringent cutoff of 0.4. Finally, we calculated SPEI for each model and counted the number of droughts (SPEI < -1) in each month. However, SPEI was not used to determine which models remained in our ensemble, because we did not have strong expectations that accuracy of past SPEI or drought frequency is a good measure of GCM skill in predicting future SPEI.
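The screening step can be sketched as follows (hypothetical function and vectors, not the analysis code; `month` is a factor aligned with the monthly series):

```r
# Pearson correlation between the 12 monthly climatological means
seasonality_cor <- function(model_vals, obs_vals, month) {
  cor(tapply(model_vals, month, mean), tapply(obs_vals, month, mean))
}

# toy example: three years of monthly precipitation
month  <- factor(rep(month.abb, times = 3), levels = month.abb)
obs_pr <- rep(c(200, 180, 150, 90, 40, 10, 5, 10, 30, 80, 140, 190), 3)

# a model that is wetter overall but has the same seasonal cycle
model_pr <- obs_pr * 1.2 + 10
keep <- seasonality_cor(model_pr, obs_pr, month) >= 0.6
keep  # TRUE: biased amounts, but seasonality is captured
```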

Comparison of observed data to CMIP6 'historical' output. Data from 1980 to 2015 only, to match the observed record. Seasonality columns are correlations of the 12 monthly means; droughts are months with SPEI < −1. (The original table's drought-seasonality column contained inline plots and is omitted here.)

| Source           | Precipitation cor¹ | Temperature cor¹ | Drought duration (months), mean ± SD |
|------------------|--------------------|------------------|--------------------------------------|
| observed²        | 1.00               | 1.00             | 3.4 ± 2.8                            |
| cas_esm2_0       | 0.93               | 0.93             | 2.1 ± 1.6                            |
| fgoals_f3_l      | 0.94               | 0.86             | 2.0 ± 1.4                            |
| awi_cm_1_1_mr    | 0.80               | 0.80             | 2.4 ± 1.3                            |
| fgoals_g3        | 0.80               | 0.73             | 2.7 ± 2.1                            |
| taiesm1          | 0.77               | 0.79             | 3.0 ± 2.5                            |
| cmcc_esm2        | 0.72               | 0.77             | 2.2 ± 1.5                            |
| access_esm1_5    | 0.68               | 0.67             | 2.1 ± 1.9                            |
| cmcc_cm2_sr5     | 0.60               | 0.71             | 2.5 ± 2.2                            |
| canesm5          | 0.50               | 0.37             | 3.1 ± 2.5                            |
| bcc_csm2_mr      | 0.14               | 0.63             | 2.2 ± 1.3                            |
| ec_earth3_veg_lr | 0.25               | 0.32             | 2.6 ± 2.0                            |
| iitm_esm         | 0.25               | −0.04            | 2.2 ± 1.3                            |
| access_cm2       | 0.05               | 0.12             | 2.0 ± 1.2                            |
| cams_csm1_0      | 0.20               | −0.09            | 2.0 ± 1.2                            |

¹ In the rendered report, red numbers highlight correlations (Pearson's r) < 0.6 for precipitation and < 0.4 for mean temperature.

² Observed data from Xavier et al. (2016).

Data validation

Check for duplicated dates

Check for overlap in the date ranges of historical experiment and SSPs
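Per model, the overlap check amounts to comparing the end of the historical run with the earliest date of its SSP runs. A base-R sketch on toy data (the real check is done with pointblank):

```r
# toy date ranges: model_a is fine, model_b's SSPs start before its
# historical experiment ends
df_dates <- data.frame(
  source_id = c("model_a", "model_b"),
  hist_end  = as.Date(c("2015-12-15", "2016-12-15")),
  date_min  = as.Date(c("2016-01-15", "2016-01-15"))  # earliest SSP date
)

# models whose SSP dates overlap their historical run
overlapping <- df_dates$source_id[df_dates$date_min <= df_dates$hist_end]
overlapping  # "model_b"
```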

Pointblank Validation
[2022-03-02|15:40:44]

Validation of the tibble df_dates: 14 col_vals_gt() steps, one per model, each checking over 4 units (one per SSP) that the earliest SSP date (date_min) is greater than the end of the historical experiment (hist_end). Steps 1–11 and 13–14 passed all 4 units; step 12 failed all 4 units (this is fgoals_g3, discussed below).

(2022-03-02 15:40:44 EST, 12.2 s, 2022-03-02 15:40:56 EST)

For fgoals_g3, the historical experiment ends in December 2016, a year after the other models' historical experiments. The SSPs start in January 2016, so the two overlap.

fgoals_g3 also has duplicated dates within the historical experiment:

# pull the fgoals_g3 historical run
fgoals_hist <- 
  bigdf %>% 
  filter(source_id == "fgoals_g3", experiment_id == "historical") 
# dates that appear more than once
dupes <- 
  fgoals_hist %>% 
  filter(duplicated(date)) %>% 
  pull(date)
# inspect all rows on those dates
fgoals_hist %>% 
  filter(date %in% dupes) %>% 
  arrange(date)
## # A tibble: 84 × 13
##    source_id experiment_id time                 hfls  hfss    pr   tas tasmax
##    <chr>     <chr>         <dttm>              <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 fgoals_g3 historical    2015-02-15 05:00:00  59.4  92.4  34.7  31.0   38.7
##  2 fgoals_g3 historical    2015-02-15 05:00:00  59.4  92.4  34.7  31.0   38.7
##  3 fgoals_g3 historical    2015-02-15 05:00:00  59.4  92.4  34.7  31.0   38.7
##  4 fgoals_g3 historical    2015-02-15 05:00:00  59.4  92.4  34.7  31.0   38.7
##  5 fgoals_g3 historical    2015-03-16 16:00:00  69.3  64.8 111.   28.6   35.5
##  6 fgoals_g3 historical    2015-03-16 16:00:00  69.3  64.8 111.   28.6   35.5
##  7 fgoals_g3 historical    2015-03-16 16:00:00  69.3  64.8 111.   28.6   35.5
##  8 fgoals_g3 historical    2015-03-16 16:00:00  69.3  64.8 111.   28.6   35.5
##  9 fgoals_g3 historical    2015-04-16 04:00:00  70.2  59.9 145.   28.6   35.6
## 10 fgoals_g3 historical    2015-04-16 04:00:00  70.2  59.9 145.   28.6   35.6
## # … with 74 more rows, and 5 more variables: tasmin <dbl>, pet <dbl>, cb <dbl>,
## #   spei <dbl>, date <date>

Something went wrong in the wrangling as a result of the overlap between the historical experiment and the SSPs. Be sure to filter fgoals_g3 to remove the last year from its historical experiment.
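A sketch of such a filter on a toy subset of bigdf (assuming, per the dates above, that the overlapping final year begins 2016-01-01; the real cutoff should be checked against the data):

```r
# toy subset of bigdf: one good row and two overlapping-year rows
bigdf <- data.frame(
  source_id     = "fgoals_g3",
  experiment_id = "historical",
  date          = as.Date(c("2015-12-15", "2016-01-15", "2016-12-15"))
)

# drop the final, overlapping year of the fgoals_g3 historical run
keep <- !with(bigdf,
              source_id == "fgoals_g3" &
              experiment_id == "historical" &
              date >= as.Date("2016-01-01"))
bigdf <- bigdf[keep, ]
nrow(bigdf)  # 1
```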

Check that values are reasonable in the “historical” experiments

Pointblank Validation
[2022-03-02|15:40:58]

Validation of the tibble bigdf (warn threshold 0.02). Steps 1–3 check for reasonable temperature values; step 4 checks for reasonable precipitation values.

| Step | Assertion             | Column | Allowed values   | Units | Fail |
|------|-----------------------|--------|------------------|-------|------|
| 1    | col_vals_between()    | tas    | [10, 45]         | 28K   | 0    |
| 2    | col_vals_between()    | tasmax | [10, 45]         | 28K   | 9    |
| 3    | col_vals_between()    | tasmin | [10, 45]         | 28K   | 0    |
| 4    | col_vals_between()    | pr     | [0, 400]         | 28K   | 147  |
| 5    | col_vals_not_null()   | spei   | —                | 88K   | 0    |
| 6    | col_vals_not_in_set() | spei   | not {−Inf, Inf}  | 88K   | 148  |

(2022-03-02 15:40:58 EST, 5.4 s, 2022-03-02 15:41:03 EST)

All of the failing temperature tests are from access_esm1_5, which has 9 tasmax values above 45 ºC (max 48 ºC). Most of the failing precipitation rows are also from access_esm1_5, which predicts ~100 months with precipitation > 400 mm (max 594 mm) in the historical experiment. Infinite values for SPEI are essentially values beyond the range of quantification. bcc_csm2_mr and fgoals_f3_l have the largest numbers of -Inf values for SPEI.
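If those ±Inf values need to be neutralized downstream, one option (a suggestion, not necessarily what was done in this analysis) is to convert them to NA before summarizing:

```r
# replace unquantifiable infinite SPEI values with NA
spei <- c(-1.2, 0.3, -Inf, 2.1, Inf)
spei[is.infinite(spei)] <- NA
spei  # -1.2  0.3   NA  2.1   NA
```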

CMIP model details

Below are plots of all data downloaded from each CMIP6 source.

access_cm2

access_esm1_5

awi_cm_1_1_mr

bcc_csm2_mr

cams_csm1_0

canesm5

cas_esm2_0

cmcc_cm2_sr5

cmcc_esm2

ec_earth3_veg_lr

fgoals_f3_l

fgoals_g3

iitm_esm

taiesm1